Hello, everyone! This is my first-ever blog post, and I am very excited about it. Setting up a blog and writing for it had been on my mind for a while, and after many failed attempts (and some excusable procrastination) I finally managed to write one.
This post covers topics that I believe make a huge difference between a good analysis and a great one.
“Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.” - Stephen Few
The post tries to cover these aspects:
- an attempt to understand the visualization process put forward by Ben Fry,
- visualizing airport locations around the world, and
- a visual study of flight routes in and around India.
My interest in generative art first drew me to Ben Fry, as I wanted to understand how to present data in a more meaningful way. His book Visualizing Data lays out a seven-step process for building a narrative out of data: acquire, parse, filter, mine, represent, refine, and interact.
Let’s cover each step one by one. We will soon realize that these are not so much incremental steps as intertwined processes that we keep returning to, one after another.
Most data visualization starts from a question. Having one matters because it cuts away unnecessary constructs and keeps the work focused on a precise answer. Our question is:
Where are the airports around the world located?
Let’s acquire the data.
Before that, I’ll load the packages we need.
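library(ggplot2)    # plotting
library(tidyr)      # separate()
library(dplyr)      # tibble(), as_tibble(), left_join()
library(sp)         # SpatialLinesDataFrame()
library(geosphere)  # gcIntermediate() for great circles
library(maps)       # world outlines used by borders()
library(plotly)     # ggplotly() for interactivity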
After a few minutes of Google searching, I found a file that lists the coordinates of airports around the globe. Let’s load it.
# Read the raw file into a one-column tibble, one line of text per row
A_loc <- tibble(value = readLines("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"))
head(A_loc)
## # A tibble: 6 x 1
## value
## <chr>
## 1 "1,\"Goroka Airport\",\"Goroka\",\"Papua New Guinea\",\"GKA\",\"AYGA\",-6.0
## 2 "2,\"Madang Airport\",\"Madang\",\"Papua New Guinea\",\"MAG\",\"AYMD\",-5.2
## 3 "3,\"Mount Hagen Kagamuga Airport\",\"Mount Hagen\",\"Papua New Guinea\",\"
## 4 "4,\"Nadzab Airport\",\"Nadzab\",\"Papua New Guinea\",\"LAE\",\"AYNZ\",-6.5
## 5 "5,\"Port Moresby Jacksons International Airport\",\"Port Moresby\",\"Papua
## 6 "6,\"Wewak International Airport\",\"Wewak\",\"Papua New Guinea\",\"WWK\",\
The data is quite messy, but we have passed the first stage: we have acquired the data. Looking carefully, we can already make out each airport’s country and coordinates.
The next step, parsing, is to give the acquired data a structure and put it in an order that makes sense to us. An easy way to test whether our data has structure is to look at the parsed dataset and see whether one can mentally “plot” something out of it.
I split the dataset into the columns described in the dataset’s documentation.
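# Strip the quotes around text fields, then split each line on commas into the 14 documented columns
# (caveat: a comma inside a quoted field, e.g. an airport name, would shift the columns)
New_A_loc <- as.data.frame(sapply(A_loc, function(x) gsub("\"", "", x)), stringsAsFactors = FALSE)
New_A_loc <- separate(data = New_A_loc, col = value, sep = ",",
                      into = c("Airport_id", "Name", "City", "Country", "IATA", "ICAO", "Lat", "Long",
                               "Alt", "Timezone", "DST", "TZ", "Type", "Source"))
# Convert coordinates and altitude from text to numbers
New_A_loc$Lat <- as.numeric(New_A_loc$Lat)
New_A_loc$Long <- as.numeric(New_A_loc$Long)
New_A_loc$Alt <- as.numeric(New_A_loc$Alt)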
head(New_A_loc)
## Airport_id Name City
## 1 1 Goroka Airport Goroka
## 2 2 Madang Airport Madang
## 3 3 Mount Hagen Kagamuga Airport Mount Hagen
## 4 4 Nadzab Airport Nadzab
## 5 5 Port Moresby Jacksons International Airport Port Moresby
## 6 6 Wewak International Airport Wewak
## Country IATA ICAO Lat Long Alt Timezone DST
## 1 Papua New Guinea GKA AYGA -6.081690 145.392 5282 10 U
## 2 Papua New Guinea MAG AYMD -5.207080 145.789 20 10 U
## 3 Papua New Guinea HGU AYMH -5.826790 144.296 5388 10 U
## 4 Papua New Guinea LAE AYNZ -6.569803 146.726 239 10 U
## 5 Papua New Guinea POM AYPY -9.443380 147.220 146 10 U
## 6 Papua New Guinea WWK AYWK -3.583830 143.669 19 10 U
## TZ Type Source
## 1 Pacific/Port_Moresby airport OurAirports
## 2 Pacific/Port_Moresby airport OurAirports
## 3 Pacific/Port_Moresby airport OurAirports
## 4 Pacific/Port_Moresby airport OurAirports
## 5 Pacific/Port_Moresby airport OurAirports
## 6 Pacific/Port_Moresby airport OurAirports
Looks good! We have given the data a structure: each airport now has an ID, a geolocation, a name, a city, and so on.
As mentioned before, although these are the seven steps, no rule says we must follow them in a particular order or use all of them. They provide a framework for working with numbers and giving them a different persona.
Since we now have a proper dataset, I will jump straight to representing it on a map.
This is where we get our first visualization: a scatter plot of airport locations on a world map. Let’s work on that.
I prefer ggplot2 over the base plotting system; I find it more flexible and easier to build up layer by layer.
We have the latitude and longitude of each airport. ggplot2’s borders() function draws a world map on which we can plot those coordinates precisely (longitude on the x-axis, latitude on the y-axis), and we can customize its look to our liking too!
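Splitting on every comma works only as long as no field contains a comma of its own. As a hedged alternative, base R’s read.csv() understands quoted fields. The sketch below compares the two approaches on two lines: the Goroka row from the output above and one hypothetical row (ID 9999, with a made-up name containing a comma). It assumes the file follows the 14-column layout used above and marks missing values as \N.

```r
# Two sample lines in the airports.dat layout. The first copies the Goroka row
# shown above; the second is HYPOTHETICAL and only illustrates a name with a comma.
sample_lines <- c(
  '1,"Goroka Airport","Goroka","Papua New Guinea","GKA","AYGA",-6.081690,145.392,5282,10,"U","Pacific/Port_Moresby","airport","OurAirports"',
  '9999,"Example Airport, Terminal A","Example City","Nowhere","XXX","XXXX",10.5,20.25,100,0,"N","Etc/UTC","airport","Hypothetical"'
)

airport_cols <- c("Airport_id", "Name", "City", "Country", "IATA", "ICAO",
                  "Lat", "Long", "Alt", "Timezone", "DST", "TZ", "Type", "Source")

# Naive approach: strip quotes, then split on every comma.
naive_fields <- lengths(strsplit(gsub('"', "", sample_lines), ","))
print(naive_fields)  # the second line yields one field too many

# Quote-aware approach: read.csv() keeps "Example Airport, Terminal A" intact,
# converts numeric columns automatically, and maps the \N placeholder to NA.
parsed <- read.csv(text = sample_lines, header = FALSE,
                   col.names = airport_cols, na.strings = "\\N",
                   stringsAsFactors = FALSE)
str(parsed)
```

Pointed at the airports.dat URL instead of `text = sample_lines`, the same read.csv() call should give a table shaped like New_A_loc with Lat, Long and Alt already numeric, though it is worth spot-checking a few rows.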
# Dark base map: country borders, with no grid lines, axis titles, labels, or ticks
world <- ggplot() +
  borders("world", colour = "#3e3e40", fill = "#3e3e40") +
  theme(panel.background = element_rect(fill = "#252526", colour = "#252526"),
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())
# One faint point per airport; `text` and `ID` are not ggplot2 aesthetics (ggplot2 warns
# that it ignores them), but ggplotly() can show them in the tooltip
map <- world +
  geom_point(aes(x = Long, y = Lat,
                 text = paste('City: ', City, '<br /> Name : ', Name),
                 ID = Airport_id),
             data = New_A_loc, colour = "#ebe6ed", alpha = 1/4, size = 0.4) +
  labs(title = "Airports")
map
Looks good! So many airports!
Sometimes it’s better to be precise with limited data than to be inaccurate with a massive dataset. Filtering removes what is not useful, or rather what has little impact on the overall visualization. Here I will filter for the airports located in India.
Moving on, the last step, interaction, is often tricky and quite underrated, yet the interactivity of a graphic has a profound impact. Giving the user control over the visual provides both depth and breadth to a plot.
For that, we will use the plotly package in R.
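A filter step like this can be sketched with base R’s subset(). In the minimal example below, the four-row table is a hypothetical stand-in for New_A_loc, and the Delhi (DEL) and Mumbai (BOM) coordinates are rounded approximations. It also shows why selecting several codes at once needs %in% rather than ==.

```r
# HYPOTHETICAL stand-in for New_A_loc with only the columns needed here;
# coordinates are rounded approximations.
toy_airports <- data.frame(
  Country = c("Papua New Guinea", "India", "India", "Papua New Guinea"),
  IATA    = c("GKA", "DEL", "BOM", "MAG"),
  Lat     = c(-6.08, 28.57, 19.09, -5.21),
  Long    = c(145.39, 77.10, 72.87, 145.79),
  stringsAsFactors = FALSE
)

# Keep only airports located in India.
india <- subset(toy_airports, Country == "India")

# Selecting several codes needs %in%. With ==, R recycles the right-hand side
# and compares row 1 with "DEL", row 2 with "BOM", row 3 with "DEL", and so on.
wrong <- subset(toy_airports, IATA == c("DEL", "BOM"))
right <- subset(toy_airports, IATA %in% c("DEL", "BOM"))
nrow(wrong)  # 0: no row happens to line up with the recycled code
nrow(right)  # 2
```

On the real data, `subset(New_A_loc, Country == "India")` would be the equivalent filter.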
ggplotly(map, tooltip = c('text', 'ID'))
And finally, we have our first visualization. The aesthetics could be improved further; I am bad with colors, and I can clearly see that.
I find Aaron Koblin’s Flight Patterns an amazing masterpiece: so simple, yet so informative. This is my short attempt to replicate his work using only the routes rather than the full flight schedules (which I was not able to access).
An important concept here is the great circle: the shortest path between two points on a sphere, which airline routes roughly follow from one airport to another. Computing these paths belongs to the mine step, which uses mathematics and statistics to uncover more details about the data.
Here I will consider only two major airports to observe a route: Delhi (DEL) and Mumbai (BOM).
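To make the great-circle idea concrete, the length of the shortest path along the Earth’s surface between two points can be computed with the haversine formula. The base-R sketch below assumes a spherical Earth with a mean radius of 6371 km; the Delhi and Mumbai coordinates are rounded approximations, and the function name haversine_km() is only illustrative.

```r
# Great-circle (haversine) distance between two points given in degrees,
# assuming a spherical Earth with mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin() guards against a rounding error pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Approximate coordinates of Delhi (DEL) and Mumbai (BOM)
d_del_bom <- haversine_km(77.10, 28.57, 72.87, 19.09)
d_del_bom  # roughly 1,140 km
```

A spherical Earth is a simplification; geosphere also offers ellipsoid-based distances when more precision is needed.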
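# Delhi and Mumbai coordinates; %in% matches either code (== would recycle the vector)
dat_point <- subset(New_A_loc, IATA %in% c("DEL", "BOM"), select = c(Long, Lat))
gg4 <- map + coord_cartesian(ylim = c(0, 60), xlim = c(40, 120)) +
  geom_point(data = dat_point, aes(x = Long, y = Lat), color = "#ebe6ed", alpha = 1/5, size = 2)
# 100 intermediate points on the great circle between the two airports, plus both endpoints
l <- as.data.frame(gcIntermediate(c(dat_point$Long[1], dat_point$Lat[1]), c(dat_point$Long[2], dat_point$Lat[2]),
                                  n = 100, addStartEnd = TRUE, sp = FALSE))
# geom_path() joins the points in order along the arc; geom_line() would re-sort them by x
gg5 <- gg4 + geom_path(data = l, aes(x = lon, y = lat), color = "white")
gg5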
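gcIntermediate() does the heavy lifting above. To show roughly what it computes, here is a base-R sketch that places points along the great circle with spherical linear interpolation: convert both endpoints to 3-D unit vectors, blend them with sine weights, and convert back to longitude and latitude. geosphere’s own implementation may use different formulas; the function great_circle_points() is hypothetical, and the coordinates are rounded approximations.

```r
# HYPOTHETICAL helper: points on the great circle between p1 and p2, each a
# c(lon, lat) pair in degrees. Returns an (n + 2) x 2 matrix that includes both
# endpoints, intended to mirror gcIntermediate(..., n = n, addStartEnd = TRUE).
great_circle_points <- function(p1, p2, n = 100) {
  to_rad <- pi / 180
  # (lon, lat) in degrees -> 3-D unit vector
  to_xyz <- function(p) {
    lon <- p[1] * to_rad
    lat <- p[2] * to_rad
    c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
  }
  a <- to_xyz(p1)
  b <- to_xyz(p2)
  # Central angle between the points, clamped against rounding error
  omega <- acos(max(-1, min(1, sum(a * b))))
  if (pi - omega < 1e-12) stop("antipodal points: the great circle is not unique")
  t <- seq(0, 1, length.out = n + 2)
  if (omega < 1e-12) {
    # Same point twice: every point on the "path" is that point
    return(cbind(lon = rep(p1[1], n + 2), lat = rep(p1[2], n + 2)))
  }
  # Sine weights keep every blended vector on the unit sphere
  w1 <- sin((1 - t) * omega) / sin(omega)
  w2 <- sin(t * omega) / sin(omega)
  xyz <- outer(w1, a) + outer(w2, b)
  cbind(lon = atan2(xyz[, 2], xyz[, 1]) / to_rad,
        lat = atan2(xyz[, 3], sqrt(xyz[, 1]^2 + xyz[, 2]^2)) / to_rad)
}

path <- great_circle_points(c(77.10, 28.57), c(72.87, 19.09), n = 100)
head(path, 3)
```

Passing `as.data.frame(path)` to geom_path() with `aes(x = lon, y = lat)` should draw the same kind of arc as gg5.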
So far we have covered six of the seven steps of visualizing data.
But again, the process is not unidirectional.
routes <- tibble(value = readLines("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"))
routes <- separate(data = routes, col = value, sep = ",",
                   into = c("Airline", "Airline_iD", "Source_airport", "Source_airport_id", "Destination_airport",
                            "Destination_airport_id", "Codeshare", "Stops", "Equipment"))
The refine step covers improving the visual features to clarify the representation.
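# OpenFlights marks missing values as \N
routes[routes == "\\N"] <- NA
# Source and destination airport IDs, each renamed to match the airport table
Routes_source <- routes[, "Source_airport_id"]
names(Routes_source) <- "Airport_id"
Routes_destination <- routes[, "Destination_airport_id"]
names(Routes_destination) <- "Airport_id"
# Airport_id, Country, Lat, Long
Airport <- New_A_loc[, c(1, 4, 7, 8)]
# Look up the country and coordinates of both ends of every route
d1 <- left_join(Routes_source, Airport, by = "Airport_id")
d2 <- left_join(Routes_destination, Airport, by = "Airport_id")
Df <- cbind(d1, d2)
names(Df) <- c("Airport_id_in", "Country_in", "Lat_in", "Long_in",
               "Airport_id_out", "Country_out", "Lat_out", "Long_out")
# Keep only routes where both airports were found
Df <- as_tibble(Df[complete.cases(Df), ])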
# `_in` columns describe the source airport, `_out` columns the destination
In <- subset(Df, Country_in == "India")    # routes departing India
Out <- subset(Df, Country_out == "India")  # routes arriving in India
## Routes departing India: one great circle per route, as SpatialLines
l <- gcIntermediate(cbind(In$Long_in, In$Lat_in), cbind(In$Long_out, In$Lat_out),
                    n = 100, addStartEnd = TRUE, sp = TRUE)
d_l <- SpatialLinesDataFrame(l,
                             data.frame(A_id_in = In$Airport_id_in,
                                        A_id_out = In$Airport_id_out,
                                        stringsAsFactors = FALSE))
# Flatten the lines into points with a `group` column per route for geom_path()
d_l_df <- fortify(d_l)
gg6 <- world + geom_path(data = d_l_df, aes(long, lat, group = group), alpha = 0.05, color = "white")
gg6_t <- map + geom_path(data = d_l_df, aes(long, lat, group = group), alpha = 0.05, color = "white")
## Routes arriving in India
l <- gcIntermediate(cbind(Out$Long_in, Out$Lat_in), cbind(Out$Long_out, Out$Lat_out),
                    n = 100, addStartEnd = TRUE, sp = TRUE)
d_l <- SpatialLinesDataFrame(l,
                             data.frame(A_id_in = Out$Airport_id_in,
                                        A_id_out = Out$Airport_id_out,
                                        stringsAsFactors = FALSE))
d_l_df <- fortify(d_l)
# Add the arriving routes to gg6 and zoom to South Asia
gg6 <- gg6 + geom_path(data = d_l_df, aes(long, lat, group = group), alpha = 0.05, color = "white")
gg6 + coord_cartesian(ylim = c(0, 60), xlim = c(40, 120))
Not exactly what I had in mind, but it loosely imitates what Aaron Koblin achieved.
I hope you liked my first attempt at writing a blog post and, most of all, my attempt to explain the basic framework for visualization. Let me know what you think.
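The route map rests on the join in the code above: each route carries only airport IDs, and the two left_join() calls look up the country and coordinates of its source and destination. Here is a minimal base-R sketch of that step on two tiny hypothetical tables (IDs and rounded coordinates made up for illustration), using match() for the lookups. It also makes the direction explicit: the `_in` columns describe the source airport, so filtering on Country_in keeps routes leaving India.

```r
# HYPOTHETICAL airport table; coordinates are rounded approximations.
toy_airports <- data.frame(
  Airport_id = c("1", "2", "3"),
  Country    = c("India", "India", "Papua New Guinea"),
  Lat        = c(28.57, 19.09, -6.08),
  Long       = c(77.10, 72.87, 145.39),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL routes; source ID "99" has no matching airport.
toy_routes <- data.frame(
  Source_airport_id      = c("1", "2", "1", "99"),
  Destination_airport_id = c("2", "1", "3", "1"),
  stringsAsFactors = FALSE
)

# Look up both ends of each route; match() keeps the route order and
# returns NA (a row of NAs after indexing) when an ID is unknown.
src <- toy_airports[match(toy_routes$Source_airport_id, toy_airports$Airport_id), ]
dst <- toy_airports[match(toy_routes$Destination_airport_id, toy_airports$Airport_id), ]
edges <- data.frame(
  Country_in  = src$Country, Lat_in  = src$Lat, Long_in  = src$Long,
  Country_out = dst$Country, Lat_out = dst$Lat, Long_out = dst$Long,
  stringsAsFactors = FALSE
)

# Drop routes with an unknown airport, then split by direction.
edges <- edges[complete.cases(edges), ]
from_india <- subset(edges, Country_in == "India")   # departing India
to_india   <- subset(edges, Country_out == "India")  # arriving in India
```

match() returns the first matching row, so it assumes airport IDs are unique, which is also what keeps left_join() from duplicating routes.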